HIGHLIGHTS


• A multimodal stereo ranging system composed of a thermal camera and a visible camera is built. 

• A simple and reliable range computation method is presented when two different cameras are placed in parallel configuration. 

• The procedure to identify the parameters required for the range computation method is described. 

• The effect of the relative position of a target and the reference points is also analyzed for the case where only two reference points are used to identify the parameters.
Binocular stereo vision can provide geometric position information about a target, which makes it
possible to track a moving object precisely. Although visible images are rich in geometric detail and
texture, it is difficult to detect moving objects in them under poor visibility. Appropriate fusion of
infrared and visible images combines their complementary information and yields a better description
of the scene, which helps in target detection and target localization. Considering the physical
differences between thermal infrared cameras and visible cameras, a stereo ranging method for the
parallel camera configuration is proposed in this paper. The method uses the rotation center of a
pan-tilt device as the origin of the coordinate system. It can not only locate an object in poor
visibility but also determine, in practical work, the parameters essential to range computation, which
is very difficult with traditional calibration methods. Furthermore, the effect of the relative position
of a target and the reference points is analyzed. Experiments confirm the validity of the proposed method.

© 2013 Elsevier B.V. All rights reserved. 


1. Introduction 

Monocular vision is simple because it avoids image fusion, so it can meet the real-time
requirements of visual surveillance. However, it can only provide two-dimensional information
about the scene. Binocular vision, on the other hand, can provide three-dimensional (3D)
information about an object by imaging it simultaneously with two cameras placed at different
positions. Binocular vision thus makes it possible to track a moving object precisely. However,
detecting and tracking a moving object with two visible cameras is a huge challenge in poor
visibility, for example at night or in fog, rain, and other inclement weather.

With the rapid development of science and technology, multi-sensor technology is increasingly
widespread. Multiple vision sensors are now widely used in visual surveillance, improving the
robustness and accuracy of surveillance systems in all weather conditions. A target ranging
system based on visual and active infrared imaging for night vision was proposed in [1]. An
active infrared detection system usually consists of an active infrared transmitter and a passive
infrared sensor (receiver). The transmitter emits a beam of light into the detection zone. The
light reflected by the background returns to the receiver, which constantly monitors the
detection zone. The effective range of an active infrared system therefore depends on the power
of the transmitter. Furthermore, the transmitter and its power supply are inconvenient in the
field. By contrast, a passive infrared detection system works without a transmitter; only one
thermal camera is needed. The thermal camera creates images based on differences in surface
temperature by detecting the infrared radiation (heat) that emanates from objects and their
surroundings. It can be used for imaging when there is insufficient visible light, such as at
night, as well as for detecting special camouflage. However, a thermal image has poor spatial
resolution and texture, which means that a texture or an edge in a visible image is often
missing from the thermal image.

Visible-infrared camera pairs are now attracting more attention because the information from
the two sensors is complementary. Although related work has been done on visual-thermal fusion,
most of it focuses on enhanced vision [2-4]. A multimodal stereo vision system consisting of a
thermal camera and a visible camera can make full use of grayscale and temperature to obtain the
depth of an object in poor visibility. It therefore contributes to precise tracking in all
weather conditions [5,6].

The intrinsic parameters of the lens (focal length and principal point), the extrinsic
parameters of the lens (relative position of the optical centers), and the disparity information
are indispensable in stereo ranging. The precision of the ranging results therefore depends
greatly on the precision of the calibrated parameters. In a typical stereo vision system composed
of two visible cameras, calibration is usually carried out with a calibration board such as a
checkerboard pattern. By placing the checkerboard pattern in front of the camera, these
parameters can be carefully calibrated. However, most existing calibration methods are
complicated and time-consuming [7-12]. It usually takes several hours to calibrate the parameters
of just one camera. Furthermore, once the focal length changes or one of the cameras moves, the
intrinsic or extrinsic parameters must be re-calibrated. These methods are therefore not suitable
for use in the field.

Calibrating these parameters is clearly the first step in computing the range in a multimodal
stereo vision system composed of a thermal camera and a visible camera whose intrinsic parameters
differ. The calibration method must also be simple and reliable to be suitable for practical
application. Furthermore, a thermal image is very different from a visible image. If a method
based on a calibration board is adopted, extra care must be taken to ensure that the board looks
similar in each modality. In [13], multimodal stereo calibration was carried out with a heated
calibration board based on the typical calibration method. As mentioned above, although the
calibration results can meet the precision requirement, the procedure is too complicated to be
applied in the field. In [14], the thermal camera was calibrated with the help of a pair of
calibrated visible cameras. However, this required ideal environmental conditions.

In this paper, the range is computed to provide position information about the target for the
tracking system, so operation speed is more important than precision. To address the problems
above, a stereo ranging method for the parallel camera configuration is proposed. The method uses
the rotation center of a pan-tilt device as the origin of the coordinate system. With this
method, both the line distance between the target and the rotation center of the pan-tilt device
and the corresponding angle can be computed. The procedure for identifying the parameters used in
the method is also given. The method is simple and easy to compute: only the geometry of the
pan-tilt devices and several reference points (at least two) are needed to determine the
parameters. The effect of the relative position of a target and the reference points is also
analyzed for the case where only two reference points are used to identify the parameters. The
region in which better ranging results can be obtained is identified experimentally. Finally,
experiments show that the results of the proposed method can meet the requirements of object
tracking and locating in all weather conditions.


2. Computing range 

In stereo vision, the relationship between a 3D point and its image projection is determined by
the camera model. The simplest camera model is the pinhole model, which is used in this paper
for both the visible camera and the thermal camera. For convenience, the positions of the image
plane and the optical center are switched in the coordinate system described in this paper (see
Fig. 1). As shown in Fig. 1, $I$ is the image plane, $O$ is the optical center of the lens,
$O_o(u_o, v_o)$ is the intersection of the optical axis and the image plane (the principal
point), $p(u, v)$ is the projection of the spatial point $P(x_c, y_c, z_c)$ onto the image
plane, and $f$ is the focal length, which for an idealized pinhole camera is exactly the distance
from the optical center to the image plane. Usually, the optical center $O$ is set as the origin
of the camera coordinate system (CCS), the optical axis is set as the Z-axis, and the optical
axis is perpendicular to the image plane. Under these assumptions, the relationship between a 3D
point $P$ and its image projection $p$ is given by:

$$ z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & u_o \\ 0 & f & v_o \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} \tag{1} $$
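As a minimal sketch of Eq. (1), the projection can be written out component by component. The function name `project_pinhole` and the sample values (in particular `v_o = 120`, which is not given in the text) are illustrative assumptions, not part of the original method.

```python
def project_pinhole(point_c, f, u_o, v_o):
    """Project a camera-frame point (x_c, y_c, z_c) to pixel coordinates via Eq. (1)."""
    x_c, y_c, z_c = point_c
    if z_c <= 0:
        raise ValueError("the point must lie in front of the camera (z_c > 0)")
    # Eq. (1), row by row: z_c * u = f * x_c + u_o * z_c, and likewise for v.
    u = f * x_c / z_c + u_o
    v = f * y_c / z_c + v_o
    return u, v


# Hypothetical point 2 m in front of the camera; f and u_o follow the experiment
# section, while v_o = 120 is an assumed value used only for illustration.
u, v = project_pinhole((100.0, -50.0, 2000.0), f=395.0, u_o=160.0, v_o=120.0)
print(u, v)
```

Only the ratio of each lateral coordinate to the depth enters the projection, which is why a single view cannot recover range and a second camera is needed.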

2.1. Stereo ranging principle in parallel camera configuration 

Depending on the relationship between the optical axes of the right and left lenses, a binocular
stereoscopic camera with controllable vergence can be set up in two ways: the toed-in
configuration and the parallel configuration. In the toed-in configuration, the optical axes are
made to cross at some point during shooting, while in the parallel configuration the optical axes
are parallel to each other. Range computation is simpler in the latter, and so is calibration.
Therefore, this study computes the range using a visible camera and a thermal camera arranged in
parallel configuration.

An example of the parallel camera configuration is shown in Fig. 2(a). The camera coordinate
system of the left camera $C_1$ is $O_1x_1y_1z_1$, and the camera coordinate system of the right
camera $C_2$ is $O_2x_2y_2z_2$. Let $P$ be the target point; then $p_1$ and $p_2$ are the
projections of $P$ onto the image planes of the left and right cameras, respectively. $P$ is the
intersection of $O_1p_1$ and $O_2p_2$, and it is uniquely determined. In the parallel
configuration, each axis of $O_1x_1y_1z_1$ is parallel to the corresponding axis of
$O_2x_2y_2z_2$. The distance from $O_1$ to $O_2$ along the X-axis is $b$, the interval between
$O_1$ and $O_2$ along the Y-axis is $e$, and the interval between $O_1$ and $O_2$ along the Z-axis
is $d$ (when $O_1$ is above $O_2$, $e > 0$; when $O_2$ is behind $O_1$, $d > 0$). In this
configuration, the left camera coordinate system $O_1x_1y_1z_1$ is set as the world coordinate
system (WCS), and the relationship between the WCS and the CCS is as follows:

$$ \begin{aligned} x_w &= x_{c1} = x_{c2} + b \\ y_w &= y_{c1} = y_{c2} + e \\ z_w &= z_{c1} = z_{c2} - d \end{aligned} \tag{2} $$

To describe the geometric relationship in the parallel configuration more directly, the 3D view
of the configuration in Fig. 2(a) is converted to a 2D view. Fig. 2(b) shows the X-Z plane
projection of the geometry of the parallel cameras. Suppose that the focal length of each camera
is $f$, the principal point of $C_1$ is $(u_{o1}, v_{o1})$ and the principal point of $C_2$ is
$(u_{o2}, v_{o2})$. The coordinates of $P$ in the left camera coordinate system are
$(x_{c1}, y_{c1}, z_{c1})$ and its coordinates in the right camera coordinate system are
$(x_{c2}, y_{c2}, z_{c2})$. $(u_1, v_1)$ are the coordinates of $p_1$ on the 2-D image plane
$I_1$ and $(u_2, v_2)$ are the coordinates of $p_2$ on the 2-D image plane $I_2$. The coordinate
system centered on $O_1$ of the left camera is the one in which the calculations are done. In
Fig. 2(b), similar triangles give the relationship between the image coordinates and the camera
coordinates:

$$ u_1 - u_{o1} = f \frac{x_{c1}}{z_{c1}} \tag{3} $$

$$ u_2 - u_{o2} = f \frac{x_{c2}}{z_{c2}} \tag{4} $$

$$ v_1 - v_{o1} = f \frac{y_{c1}}{z_{c1}} \tag{5} $$

$$ v_2 - v_{o2} = f \frac{y_{c2}}{z_{c2}} \tag{6} $$

Thus, $y_{c1}$, $x_{c1}$ and $z_{c1}$ can be solved from these equations together with Eq. (2):

$$ y_{c1} = \frac{(v_1 - v_{o1})\,[bf + d(u_2 - u_{o2})]}{f\,(u_1 - u_2 + u_{o2} - u_{o1})} \tag{7} $$

$$ x_{c1} = \frac{(u_1 - u_{o1})\,[bf + d(u_2 - u_{o2})]}{f\,(u_1 - u_2 + u_{o2} - u_{o1})} \tag{8} $$

$$ z_{c1} = \frac{bf + d(u_2 - u_{o2})}{u_1 - u_2 + u_{o2} - u_{o1}} \tag{9} $$


The expression for $z_{c1}$ shows that it can be solved from Eqs. (3) and (4), which involve only
the X- and Z-coordinates. In this paper, the range is computed to obtain the position of the
tracked object, so information along the Y-axis is not essential, especially when the object only
moves on the ground. Furthermore, ignoring the Y-axis reduces the computational complexity.
Therefore, only information along the X-axis and Z-axis is considered in stereo ranging in this
paper. This is the stereo ranging principle in the parallel camera configuration.
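The X-Z part of the principle above can be sketched in a few lines. The function name `triangulate_parallel` and the numerical values of `b`, `d` and the test point are hypothetical and chosen only to show a round trip: a point is projected into both cameras using the relations $x_{c2} = x_{c1} - b$ and $z_{c2} = z_{c1} + d$, and then recovered from the two image coordinates.

```python
def triangulate_parallel(u1, u2, f, u_o1, u_o2, b, d):
    """Recover (x_c1, z_c1) in the left camera frame from u1 and u2 in the X-Z plane."""
    # Generalized disparity: the denominator shared by the x and z solutions.
    disparity = (u1 - u_o1) - (u2 - u_o2)
    if disparity == 0:
        raise ValueError("zero disparity: the range is not defined")
    z_c1 = (b * f + d * (u2 - u_o2)) / disparity
    x_c1 = (u1 - u_o1) * z_c1 / f
    return x_c1, z_c1


# Round trip with assumed geometry (all values hypothetical, in mm and pixels).
f, u_o1, u_o2, b, d = 395.0, 160.0, 158.0, 190.0, -20.0
x_true, z_true = 300.0, 2000.0
u1 = u_o1 + f * x_true / z_true                # left image coordinate
u2 = u_o2 + f * (x_true - b) / (z_true + d)    # right image coordinate
x_est, z_est = triangulate_parallel(u1, u2, f, u_o1, u_o2, b, d)
print(x_est, z_est)
```

When `d` is zero and the principal points coincide, the depth expression reduces to the familiar baseline times focal length over disparity.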


2.2. Proposed ranging equations 

Eqs. (8) and (9) show that $x_{c1}$ and $z_{c1}$ are functions of $f$, $u_{o1}$, $u_{o2}$, $u_1$,
$u_2$, $b$ and $d$. The focal length $f$ of the visible camera and of the thermal camera can be
obtained from each camera's manual, and it is not difficult to choose lenses with the same focal
length. $u_1$ and $u_2$ can be read from the images or obtained through stereo matching. $u_{o1}$
and $u_{o2}$ are the x-coordinates of the principal points of the two cameras, and under ideal
conditions they lie at the center of the image plane. $b$ and $d$ describe the geometric
relationship between $O_1$ and $O_2$, which means they are extrinsic parameters. However, the
optical center is not a well-defined, constant, physical location. It actually moves as the lens
focuses and zooms, which makes $b$ and $d$ difficult to measure physically. Most stereo ranging
systems obtain $b$ and $d$ by off-line calibration [9]. As mentioned above, once the focal length
changes or the camera moves, the extrinsic parameters must be re-calibrated. Furthermore, the
classic calibration procedure is complicated and time-consuming.

The location of the optical center is invisible and uncertain, which makes $b$ and $d$ difficult
to measure. In contrast, the camera rotation center, around which the pan-tilt device rotates the
camera, is a well-defined location. The distance between the two rotation centers is therefore
easier to measure in practice. Instead of $O_1$, the optical center of the left camera, the left
rotation center $R_1$ is selected as the origin of the coordinate system in this paper. Every
geometric relationship involving the optical centers can thus be transformed into one involving
the rotation centers. Accordingly, the distance from $P$ to $R_1$, rather than to $O_1$, is the
distance to be solved in this paper.

In the presented method, the two cameras are mounted separately on two pan-tilt devices with the
same configuration. $R_1$, the rotation center of the left camera, is set as the origin of the
coordinate system $XR_1Z$. As shown in Fig. 3, the rotation center $R_i$ does not coincide with
the optical center $O_i$ ($i = 1, 2$; here and hereinafter, a parameter with the subscript $i$
refers to the left camera when $i = 1$ and to the right camera when $i = 2$). In the X-Z plane,
the distance between $R_i$ and $O_i$ is $m_i$ along the X-axis and $n_i$ along the Z-axis. The
distance between $R_1$ and $R_2$ is $r$ along the X-axis and $t$ along the Z-axis. In $XR_1Z$,
when $O_i$ is to the right of $R_i$, $m_i > 0$; when $O_i$ is behind $R_i$, $n_i > 0$; when $R_2$
is behind $R_1$, $t > 0$. $x_0$ is the distance from $P$ to $R_1$ along the X-axis and $z_{01}$
is the distance from $P$ to $R_1$ along the Z-axis. In this configuration, when the optical axes
are parallel, the relationship between $b$ and $r$ and the relationship between $d$ and $t$ can
be described as follows:

$$ b = r + m_2 - m_1 \tag{10} $$

$$ d = t + n_2 - n_1 \tag{11} $$

Accordingly, the relationship between $x_0$ and $x_{c1}$ and the relationship between $z_{01}$
and $z_{c1}$ can be described as follows:

$$ x_{c1} = x_0 - m_1 \tag{12} $$

$$ z_{c1} = z_{01} + n_1 \tag{13} $$

Therefore, in $XR_1Z$, Eqs. (3) and (4) are transformed as follows:

$$ u_1 - u_{o1} = \frac{f\,(x_0 - m_1)}{z_{01} + n_1} \tag{14} $$

$$ u_2 - u_{o2} = \frac{f\,(x_0 - r - m_2)}{z_{01} + t + n_2} \tag{15} $$

Thus, $x_0$ and $z_{01}$ can be solved from these equations:

$$ x_0 = \frac{(u_1 - u_{o1})\,[f(r + m_2 - m_1) + (u_2 - u_{o2})(t + n_2 - n_1)]}{f\,(u_1 - u_2 + u_{o2} - u_{o1})} + m_1 \tag{16} $$

$$ z_{01} = \frac{f(r + m_2 - m_1) + (u_2 - u_{o2})(t + n_2 - n_1)}{u_1 - u_2 + u_{o2} - u_{o1}} - n_1 \tag{17} $$

Further, the line distance $l_1$ from $P$ to $R_1$ and the angle $\theta$ between the line
connecting $P$ and $R_1$ and the X-axis can be obtained from Eqs. (18) and (19).

$$ l_1 = \sqrt{x_0^2 + z_{01}^2} \tag{18} $$

$$ \theta = \arctan\frac{z_{01}}{x_0} \tag{19} $$

Finally, $x_0$ and $z_{01}$ can be computed when $f$, $u_{oi}$, $u_i$, $r$, $t$, $m_i$ and $n_i$
are known. Then $l_1$ and $\theta$ can be obtained too.


2.3. Parameter identification

The parameters in Eqs. (16) and (17) such as $f$, $u_{oi}$, $u_i$, $r$ and $t$ can be obtained in
the way described in the previous section. However, $m_i$ and $n_i$ are still unknown. These four
parameters must be identified before $x_0$ and $z_{01}$ can be computed. According to Eqs. (16)
and (17), $x_0$ and $z_{01}$ can be solved when all the other parameters are known. Conversely,
$m_i$ and $n_i$ can be solved when all the other parameters are known.

For a fixed ranging system in which $f$, $u_{oi}$, $r$ and $t$ are known and constant, a point
whose coordinates on the two image planes and in $XR_1Z$ are known provides two equations, and
two such points provide four equations. Thus a minimum of two points is needed to solve for the
four parameters.

Suppose there are two spatial reference points $Q_1$ and $Q_2$ whose coordinates on the two image
planes and in $XR_1Z$ are known. $x_0(1)$ is the distance from $Q_1$ to $R_1$ along the X-axis
and $z_{01}(1)$ is the distance from $Q_1$ to $R_1$ along the Z-axis. $u_1(1)$ is the
x-coordinate of $Q_1$ on the left image plane and $u_2(1)$ is the x-coordinate of $Q_1$ on the
right image plane. $x_0(2)$ is the distance from $Q_2$ to $R_1$ along the X-axis and $z_{01}(2)$
is the distance from $Q_2$ to $R_1$ along the Z-axis. $u_1(2)$ is the x-coordinate of $Q_2$ on
the left image plane and $u_2(2)$ is the x-coordinate of $Q_2$ on the right image plane. Applying
Eqs. (14) and (15) to each reference point gives the following equations:

$$ \begin{cases} u_1(k) - u_{o1} = \dfrac{f\,(x_0(k) - m_1)}{z_{01}(k) + n_1} \\[2ex] u_2(k) - u_{o2} = \dfrac{f\,(x_0(k) - r - m_2)}{z_{01}(k) + t + n_2} \end{cases} \qquad k = 1, 2 \tag{20} $$

$m_1$, $m_2$, $n_1$, and $n_2$ can be solved from (20):

$$ \begin{cases} m_1 = \dfrac{(z_{01}(1) - z_{01}(2))(u_1(1) - u_{o1})(u_1(2) - u_{o1}) - f(u_1(2) - u_{o1})\,x_0(1) + f(u_1(1) - u_{o1})\,x_0(2)}{f\,(u_1(1) - u_1(2))} \\[2ex] n_1 = \dfrac{u_{o1}(z_{01}(1) - z_{01}(2)) + f(x_0(1) - x_0(2)) + u_1(2)\,z_{01}(2) - u_1(1)\,z_{01}(1)}{u_1(1) - u_1(2)} \end{cases} \tag{21} $$

The corresponding expressions for $m_2$ and $n_2$ are obtained in the same way from the
right-camera equations in (20). Eq. (21) gives a unique solution when the cameras are fixed. In
practical operation, $r$ and $t$ can be measured directly, and $x_0$ and $z_{01}$ are obtained by
simple geometric operations (see the Appendix).

In sum, for a fixed ranging system in which $f$, $u_{oi}$, $r$ and $t$ are known and constant,
only two reference points with known $l_1$, $l_2$, $u_1$ and $u_2$ are needed to solve for $m_i$
and $n_i$. With the identified parameters, the line distance from $P$ to $R_1$ can be computed,
as well as the corresponding angle $\theta$. The parameters involved, such as $r$, $t$, $l_1$ and
$l_2$, are relatively easy to measure in operation. Unlike the traditional calibration method,
the parameters in this paper can be identified without a calibration board. One can identify the
essential parameters by arbitrarily selecting two points in the common field of view of the
cameras as reference points and measuring the line distances between each point and each rotation
center. The calculation is also simple.

3. Experimental results


In this section, experimental results are presented for the range computation. Two cameras are
mounted on a pair of pan-tilt devices that can translate horizontally and rotate left and right.
The optical axes can thus be set in parallel configuration by rotating the pan-tilt devices. As
mentioned in Section 2.3, the parameters can be identified through the steps shown in Fig. 4.
Since ranging is the purpose of this paper, the matching of conjugate points in the stereo pair
of images is not discussed here. $u_1(k)$ and $u_2(k)$ ($k = 1, 2$) are matched manually in this
experiment. With the identified parameters, $l_1$ and $\theta$ of a target can be computed.

[Fig. 4 flowchart: establish the parallel ranging system and get $f$; select $Q_1$ and $Q_2$;
acquire the ground-truth data, including $u_2(k)$ ($k = 1, 2$); transfer $l_1(k)$ and $l_2(k)$ to
$x_0(k)$ and $z_{01}(k)$ using Eqs. (26) and (27).]

Fig. 4. The parameter identification steps.

In Section 3.1, the effect of the relative position of the target and the reference points on
ranging accuracy is analyzed for the case where only two reference points are used to identify
the parameters. In Section 3.2, the range from a moving person to $R_1$ is computed with the
proposed ranging method.

3.1. The relative position of the target and two reference points 

Although $m_1$, $m_2$, $n_1$, and $n_2$ can be determined once the cameras are fixed, some
factors still affect the ranging accuracy. In this part, we discuss the effect of the relative
position of the target and the reference points.

To facilitate the analysis, we divide the scene in front of the two cameras into four regions,
RA, RB, RC and RD. As shown in Fig. 5(a), $Q_1$ and $Q_2$ are the reference points, and $R_1$ and
$R_2$ are the rotation centers of the two cameras. Fig. 5(b) shows the experimental scheme. The
objects, $A_1$-$A_7$ and $T_1$-$T_3$, are arranged in two rows. Considering the halation in
thermal images, the center of each object is chosen as the spatial point in this experiment.
Different combinations of reference points are also shown in Fig. 5(b). The information about
these spatial points is listed in Table 1.

Fig. 6 shows the real images captured by the cameras. The left one is the thermal image and the
right one is the visible image. The water temperature in the cups was about 80 °C and the room
temperature was about 20 °C. The focal length of the thermal camera is 395 pixels. Supposing that
the principal point is at the center of the image, $u_{o1} = 160$. The focal length of the
visible camera is also set to 395 pixels, and $u_{o2} = 158$. The parameters $r$ and $t$ were
obtained by measurement: $r = 133$ mm, $t = 12$ mm (the left camera was behind the right). The
parameters can therefore be identified by the steps described before, and the range of a target
can be computed. The ranging accuracy is quantified in terms of $|E_x|$ and $|E_z|$, calculated
by (22) and (23).

$$ E_x = x_{0c} - x_{0t} \tag{22} $$

$$ E_z = z_{01c} - z_{01t} \tag{23} $$

Here $z_{01t}$ and $x_{0t}$ are the ground-truth data, and $z_{01c}$ and $x_{0c}$ are the
computed results. In this paper, $|E_x|$ and $|E_z|$ are measured in millimeters.

When $T_1$, $T_2$ and $T_3$ are selected as the targets and two of $A_1$-$A_7$ are selected as
reference points, the target is either in region RB or in RC (the dashed lines in Fig. 5(b)).
When $A_2$ and $A_5$ are selected as the targets and two of $T_1$-$T_3$ are selected as reference
points, the target is either in region RA or in RD (the solid lines in Fig. 5(b)). The region to
which a target belongs, RA, RB, RC or RD, depends on $(Q_1, Q_2)$. For example, $T_2$ is in RC
when $(Q_1, Q_2) = (A_1, A_2)$ and in RB when $(Q_1, Q_2) = (A_4, A_5)$. Table 2 lists the
ranging results of different targets computed with different pairs of reference points
$(Q_1, Q_2)$. A comparison can be made based on these results.

Table 2 shows that, in general, $|E_x|$ is smaller when a target is in RB or RA than when it is
in RC or RD. A similar result holds when different targets are computed with the same pair of
reference points $(Q_1, Q_2)$.

For a given pair of reference points $(Q_1, Q_2)$, $|E_z|$ is constant for different targets
because the targets share the same value of $z_{01t}$. For a given target, $|E_z|$ is smaller
when $Q_1$ is closer to the principal point.

Therefore, to obtain a better ranging result for a target based on two reference points, it is
better to select $Q_1$ and $Q_2$ so that the target lies in region RA and to select as $Q_1$ the
point close to the principal point in the image plane. This is also verified in the next
experiment, which was run in practice.

Fig. 7 shows the real images captured by the parallel ranging system. Hot water in a bottle and
an energized adapter are selected as the heat sources. The related parameters are as follows:
$r = 155$ mm, $t = -120$ mm (the left was behind the right), $f = 395$ pixels, $u_{o1} = 160$,
$u_{o2} = 178$. $Q_1$ and $Q_2$ are shown in Fig. 7, and their information is listed in Table 3.
The parameters can thus be identified, with the following results:

$m_1 = -27.6$ mm, $m_2 = 8.1$ mm, $n_1 = 119.4$ mm, $n_2 = 42.5$ mm.
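The identification step of Section 2.3 can be sketched as follows. Cross-multiplying the projection equation of each camera gives, for each reference point $k$, a relation that is linear in the unknowns, $f m + a_k n = f x_k - a_k z_k$, where $a_k$ is the image coordinate relative to the principal point; for the right camera, $x_k$ and $z_k$ are first shifted by $r$ and $t$. The function names are hypothetical, and the reference-point values are the $x_0$, $z_{01}$, $u_1$ and $u_2$ entries of Table 3.

```python
def solve_offsets(a1, a2, x1, x2, z1, z2, f):
    """Solve f*m + a_k*n = f*x_k - a_k*z_k (k = 1, 2) for the offsets (m, n)."""
    if a1 == a2:
        raise ValueError("reference points must differ in image x-coordinate")
    n = (f * (x1 - x2) + a2 * z2 - a1 * z1) / (a1 - a2)
    m = x1 - a1 * (z1 + n) / f
    return m, n


def identify_parameters(q1, q2, f, u_o1, u_o2, r, t):
    """q = (x_0, z_01, u_1, u_2) for each reference point; returns m1, m2, n1, n2."""
    (x1, z1, u11, u21), (x2, z2, u12, u22) = q1, q2
    # Left camera: coordinates are already relative to R1.
    m1, n1 = solve_offsets(u11 - u_o1, u12 - u_o1, x1, x2, z1, z2, f)
    # Right camera: shift to R2 (x - r, z + t) before solving the same system.
    m2, n2 = solve_offsets(u21 - u_o2, u22 - u_o2, x1 - r, x2 - r, z1 + t, z2 + t, f)
    return m1, m2, n1, n2


q1 = (77.5, 945.0, 199, 139)     # Q1: x_0, z_01, u_1, u_2 (Table 3)
q2 = (750.0, 2419.0, 281, 277)   # Q2
m1, m2, n1, n2 = identify_parameters(q1, q2, f=395.0, u_o1=160, u_o2=178,
                                     r=155.0, t=-120.0)
print(round(m1, 1), round(m2, 1), round(n1, 1), round(n2, 1))
```

With these inputs the sketch lands within rounding of the four values reported above, which suggests that the sign conventions assumed here match those of the experiment.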

The targets are labeled a to d, and their ranging results are listed in Table 4.
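A minimal sketch of the range computation with the identified parameters is given here for target d of Table 4. The function name `compute_range` is hypothetical. `math.atan2` is used in place of a plain arctangent of $z_{01}/x_0$; this is an assumption made so that the angle stays well defined when $x_0 \le 0$, and the two agree whenever $x_0 > 0$.

```python
import math


def compute_range(u1, u2, f, u_o1, u_o2, r, t, m1, m2, n1, n2):
    """Return (x_0, z_01, l_1, theta) of a target relative to the rotation center R1."""
    b = r + m2 - m1                      # X offset between the optical centers
    d = t + n2 - n1                      # Z offset between the optical centers
    disparity = (u1 - u_o1) - (u2 - u_o2)
    if disparity == 0:
        raise ValueError("zero disparity: the range is not defined")
    z_c1 = (f * b + (u2 - u_o2) * d) / disparity
    x_0 = (u1 - u_o1) * z_c1 / f + m1    # shift from O1 back to R1 along X
    z_01 = z_c1 - n1                     # shift from O1 back to R1 along Z
    l_1 = math.hypot(x_0, z_01)          # line distance from the target to R1
    theta = math.atan2(z_01, x_0)        # angle to the X-axis (assumed atan2 form)
    return x_0, z_01, l_1, theta


params = dict(f=395.0, u_o1=160.0, u_o2=178.0, r=155.0, t=-120.0,
              m1=-27.6, m2=8.1, n1=119.4, n2=42.5)
x_0, z_01, l_1, theta = compute_range(117, 89, **params)   # target d
e_x = abs(x_0 - (-269.0))    # ground truth x_0t of target d
e_z = abs(z_01 - 1890.0)     # ground truth z_01t of target d
print(round(x_0, 1), round(z_01, 1), round(e_x, 1), round(e_z, 1))
```

Because the published parameters are rounded to 0.1 mm, the errors obtained this way are close to, but not exactly equal to, the Table 4 entries for target d.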

According to Fig. 5(a), targets a and b are in region RB, so their values of $|E_x|$ are much
smaller than those of targets c and d. Target c is neither in region RB nor close to the
principal point, so its ranging results are the worst along both the X-axis and the Z-axis.
Besides, the results for target a and target d show that, when the targets are close to the
principal point in the image plane, $|E_z|$ is smaller for the one closer to the camera along the
Z-axis. One can infer that the longer the distance from the target to the camera along the
Z-axis, the fewer pixels the target occupies in the image and the fainter its image on the
thermal image plane. This makes the mismatch more severe, so the error of the range computation
is larger.

Fig. 5. The regions divided by $R_1$, $R_2$, $Q_1$ and $Q_2$. (a) The [...]tion between the
target and the [...] by $R_1$, $R_2$, $Q_1$ and $Q_2$. (b) The [...] different points in Fig. 6.

Table 1. The information of spatial points in Fig. 5(b).

Point   (u_1, u_2)    x_0t/mm   z_01t/mm
A_1     (32, 9)       -672.5    2016
A_2     (80, 57)      -406.5    2016
A_3     (118, 95)     -201.5    2016
A_4     (157, 134)    -1.5      2016
A_5     (198, 175)    199.9     2016
A_6     (239, 216)    398.5     2016
A_7     (290, 267)    688.5     2016
T_1     (59, 39)      -611.5    2323
T_2     (175, 155)    98.5      2323
T_3     (259, 239)    598.5     2323

Fig. 6. Real images captured by multimodal stereo system in parallel configuration.

Table 2. Ranging results of different targets computed with different pairs of reference points
in Fig. 6 (columns: region, $|E_x|$/mm, $|E_z|$/mm).

Fig. 7. Real images captured by another multimodal stereo system in parallel configuration.

Table 3. The information of two reference points in Fig. 7.

Q_i   l_1/mm   l_2/mm   x_0/mm   z_01/mm   u_1   u_2
Q_1   948      829      77.5     945       199   139
Q_2   2533     2375     750      2419      281   277

Table 4. Ranging results of different targets in Fig. 7.

Target   (u_1, u_2)    x_0t/mm   z_01t/mm   |E_x|/mm   |E_z|/mm
a        (224, 212)    356       2334       12.8       165
b        (250, 242)    521       2349       1.2        55.4
c        (56, 18)      -460      1590       70         198.7
d        (117, 89)     -269      1890       21.6       9.6

3.2. Experimental result of a moving target 

In this section, the range from a moving person to the left rotation center of the pan-tilt
device is computed. 2-D image sequences were captured by the multimodal stereo ranging system as
the person moved through the indoor environment. Part of the image sequences is shown in Fig. 8.
Fig. 8(a1) to (e1) are thermal images and Fig. 8(a2) to (e2) are visible images. The person is
178 cm tall. The room temperature was about 20 °C. As the head is hotter than the background, it
is brighter in the thermal images. The center of the head is selected as the target point, and it
is extracted through background subtraction and threshold segmentation. To determine the accuracy
of the range computed for tracking, the scene was calibrated too. This was done by having a
person walk around the test-bed area, stopping at preset locations in the scene. At each
location, the range from the person's head to $R_1$ was measured to be used as ground truth. In
this environment, the following parameter values were obtained by measurement: $r = 243$ mm,
$t = 65$ mm. The range is computed based on the parameters identified in the last section. The
experimental results are listed in Table 5. In this part, the ranging results are quantified in
terms of the relative errors $E_l$ and $E_\theta$:

$$ E_l = \frac{|l_{1c} - l_{1t}|}{l_{1t}} \times 100\% \tag{24} $$

$$ E_\theta = \frac{|\theta_c - \theta_t|}{|\theta_t|} \times 100\% \tag{25} $$

Here $l_{1t}$ and $\theta_t$ are the ground-truth data, and $l_{1c}$ and $\theta_c$ are the
computed results.
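As a small worked example of the relative error of the line distance, the sketch below assumes that the surviving Table 5 columns pair up row by row as locations a to e (ground-truth $l_{1t}$ against computed $l_{1c}$); that pairing, like the function name `relative_error_percent`, is an assumption made for illustration.

```python
def relative_error_percent(computed, truth):
    """Relative error |computed - truth| / |truth| in percent."""
    return abs(computed - truth) / abs(truth) * 100.0


# Line distances in mm for locations a to e (assumed row pairing of Table 5).
l_1t = [2640.3, 3423.9, 4213.7, 3375.9, 2577.8]   # ground truth
l_1c = [2764.5, 3068.7, 3709.9, 3035.2, 2643.4]   # computed results

errors = [relative_error_percent(c, t) for c, t in zip(l_1c, l_1t)]
average = sum(errors) / len(errors)
print([round(e, 1) for e in errors], round(average, 2))
```

Under this pairing the average stays below 8%, and the per-location errors grow with the ground-truth distance.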

Both average relative errors of the line distances are less than 8%, and both average relative
errors of the angle are less than 1%.

The accuracy of the proposed method can basically meet the demands of object tracking. Table 5
shows that the ground-truth distances $l_{1t}$ of the person at the different locations satisfy
$l_{1t}(e) < l_{1t}(a) < l_{1t}(d) < l_{1t}(b) < l_{1t}(c)$. The results computed with the
parameters identified from two reference points show that
$E_l(e) < E_l(a) < E_l(d) < E_l(b) < E_l(c)$ and
$E_\theta(e) < E_\theta(a) < E_\theta(d) < E_\theta(b) < E_\theta(c)$. That is, as the distance
between the person and the camera increases, the relative errors of the range and the angle grow.
As analyzed in Section 3.1, the longer the distance from the target to the camera, the fewer
pixels the target occupies in the image and the fainter its image on the image plane, so the
error is larger. In addition, the corresponding points were matched manually and therefore could
not be matched precisely. The errors in this experiment were largely caused by this mismatch,
which will be improved in future work.

The distances from the reference points to $R_1$ were measured manually with a meter stick in the
experiment. The accuracy of the parameter calibration depends greatly on the measurement
accuracy. More reference points can be used in calibration for robustness. The calibration error
caused by the measurement error can be partly reduced by increasing the number of measurements,
which then improves the ranging accuracy. If there are more than two reference points, Eq. (20)
is extended to an overdetermined system of equations, and an iterative optimization algorithm is
needed to search for the optimum values of $m_1$, $n_1$, $m_2$ and $n_2$. The computational cost
therefore increases. In this experiment, three points in Fig. 7 are selected as the reference
points to identify the unknown parameters, and the Levenberg-Marquardt (LM) method is used. The
results are as follows: $m_1 = -33.3$ mm, $m_2 = 4.1$ mm, $n_1 = 83.7$ mm, $n_2 = 58.5$ mm.
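The sketch below shows one way to handle more than two reference points. It is not the Levenberg-Marquardt procedure used in the experiment: it swaps in ordinary linear least squares, which applies because each projection equation, once cross-multiplied, is linear in the offsets ($f m + a_k n = f x_k - a_k z_k$). The two criteria weight the residuals differently, so the fitted values need not coincide with an LM fit on noisy data. All names and the synthetic reference points are hypothetical.

```python
def fit_offsets(rows, f):
    """Least-squares (m, n) for f*m + a_k*n = f*x_k - a_k*z_k over all rows (a, x, z)."""
    s_ff = s_fa = s_aa = s_fb = s_ab = 0.0
    for a, x, z in rows:
        rhs = f * x - a * z
        s_ff += f * f
        s_fa += f * a
        s_aa += a * a
        s_fb += f * rhs
        s_ab += a * rhs
    det = s_ff * s_aa - s_fa * s_fa       # determinant of the 2x2 normal equations
    if det == 0:
        raise ValueError("degenerate reference points")
    m = (s_fb * s_aa - s_fa * s_ab) / det
    n = (s_ff * s_ab - s_fa * s_fb) / det
    return m, n


def identify_parameters_lsq(points, f, u_o1, u_o2, r, t):
    """points: list of (x_0, z_01, u_1, u_2); returns m1, m2, n1, n2."""
    left = [(u1 - u_o1, x, z) for x, z, u1, _ in points]
    right = [(u2 - u_o2, x - r, z + t) for x, z, _, u2 in points]
    m1, n1 = fit_offsets(left, f)
    m2, n2 = fit_offsets(right, f)
    return m1, m2, n1, n2


# Synthetic check: generate noise-free image coordinates from assumed offsets.
f, u_o1, u_o2, r, t = 395.0, 160.0, 178.0, 155.0, -120.0
true = dict(m1=-30.0, m2=5.0, n1=100.0, n2=50.0)
points = []
for x, z in [(77.5, 945.0), (750.0, 2419.0), (-400.0, 1800.0)]:
    u1 = u_o1 + f * (x - true["m1"]) / (z + true["n1"])
    u2 = u_o2 + f * (x - r - true["m2"]) / (z + t + true["n2"])
    points.append((x, z, u1, u2))
m1, m2, n1, n2 = identify_parameters_lsq(points, f, u_o1, u_o2, r, t)
print(m1, m2, n1, n2)
```

With exactly two reference points the normal equations reduce to the closed-form two-point solution, so the same routine covers both cases.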

The range between the moving person and the camera can then be computed based on this set of
identified parameters. The results are also shown in Table 5.

The results show that the accuracy is higher than that of the range computed with the parameters
identified from two reference points. However, the improvement is only slight. Although the
ranging accuracy of (b) to (d) is improved, the ranging accuracy of (a) and (e) is degraded. This
is because the measurement error caused by the measurement method cannot be eliminated by
increasing the number of measurements. It is therefore very important to improve the measurement
accuracy. It appears that two reference points are enough when great accuracy is not needed.




Fig. 8. The moving person captured by the multimodal stereo system in parallel configuration.
(a1)-(e1) The thermal images captured at the first to fifth locations. (a2)-(e2) The visible
images captured at the first to fifth locations.
Table 5. [...] with the parameters identified by [...]

Location   (u_1, u_2)    l_1t/mm   θ_t/rad   l_1c/mm
a          (256, 235)    2640.3    1.340     2764.5
b          (238, 221)    3423.9    1.393     3068.7
c          (225, 214)    4213.7    1.427     3709.9
d          (191, 174)    3375.9    1.512     3035.2
e          (196, 174)    2577.8    1.493     2643.4

Further l_1c/mm values (rows not recoverable): 3186.9, 3832.7, 3083.7

In addition, only the linear model is discussed in the proposed method, and lens distortion is
ignored. The principal point of the thermal camera is not precisely calibrated, and there is
halation in the thermal images. All these factors contribute to the calibration error and
therefore increase the ranging errors.

A ranging system composed of a thermal camera and a visible camera arranged in parallel
configuration is built in the experiment. It is used to compute the range between different
target points and the camera in different scenes by the proposed method, and it is easy to
operate. For a given target, the effect of the relative position of the target and the reference
points is analyzed for the case where only two reference points are used to identify the
parameters. The subsequent experiments are carried out based on this analysis. Furthermore, the
calibrated stereo system can be used in a scene different from the calibration scene. The average
relative error of the range computation with the proposed method is less than 8%. Considering
that most of the distance parameters are measured manually in the proposed method and that the
thermal images cannot be as clear as the visible images, the performance is acceptable for object
tracking.


4. Conclusions

As the thermal camera differs from the visible camera, a ranging method based on the rotation
center of the pan-tilt device was proposed in this paper. With the proposed method, the range can
be computed in a stereo system composed of a thermal camera and a visible camera arranged in
parallel configuration, which contributes to precise object tracking and locating in all weather
conditions. In addition, a simple, fast and reliable method for determining the parameters
required for range computation was presented. The information about the reference points used in
calibration can be measured easily. The method can be implemented easily and efficiently with a
visible-thermal camera pair on most computer systems at very low cost. Experiments with real
images confirmed the validity of the method.

Appendix A

In practical operation, $r$ and $t$ can be measured directly. However, measuring $x_0$ and
$z_{01}$ is more complicated, and larger errors may be made during the measurement. By contrast,
the line distances from $Q_1$ ($Q_2$) to each rotation center ($l_1$ and $l_2$) are easier to
measure. As shown in Fig. 9, the geometric relationship between the line distances and $x_0$ or
$z_{01}$ is fixed, so $x_0$ and $z_{01}$ can be computed by Eqs. (26) and (27).

$$ x_0 = l_1 \cos\left( \arccos\left( \frac{l_1^2 + t^2 + r^2 - l_2^2}{2 l_1 \sqrt{t^2 + r^2}} \right) - \arctan\left( \frac{t}{r} \right) \right) \tag{26} $$

$$ z_{01} = l_1 \sin\left( \arccos\left( \frac{l_1^2 + t^2 + r^2 - l_2^2}{2 l_1 \sqrt{t^2 + r^2}} \right) - \arctan\left( \frac{t}{r} \right) \right) \tag{27} $$

Let $AR_1$ be the vertical distance between $R_1$ and $R_2$, and let $AR_2$ be the horizontal
distance between $R_1$ and $R_2$: $AR_1 = t$ (signed) and $AR_2 = r$. When $R_2$ is behind $R_1$,
$t > 0$. Let $QE$ be the vertical distance between $R_1$ and $Q$, and let $R_1E$ be the
horizontal distance between $R_1$ and $Q$: $QE = z_{01}$ and $R_1E = x_0$. Let $QR_1$ be the line
distance between $R_1$ and $Q$, and let $QR_2$ be the line distance between $R_2$ and $Q$:
$QR_1 = l_1$ and $QR_2 = l_2$. The parameters to be worked out are $QE = z_{01}$ and
$R_1E = x_0$.

According to the Pythagorean theorem:

$$ R_1R_2^2 = AR_1^2 + AR_2^2 = t^2 + r^2 $$

Since $\tan\angle R_1R_2A = AR_1 / AR_2 = t / r$,

$$ \angle DR_1R_2 = \angle R_1R_2A = \arctan\left( \frac{t}{r} \right) $$

According to the law of cosines:

$$ \cos\angle QR_1R_2 = \frac{QR_1^2 + R_1R_2^2 - QR_2^2}{2\, QR_1 \cdot R_1R_2} = \frac{l_1^2 + t^2 + r^2 - l_2^2}{2 l_1 \sqrt{t^2 + r^2}} $$

$$ \angle QR_1R_2 = \arccos\left( \frac{l_1^2 + t^2 + r^2 - l_2^2}{2 l_1 \sqrt{t^2 + r^2}} \right) $$

$$ \angle QR_1D = \angle QR_1R_2 - \angle DR_1R_2 $$
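The conversion from the two measured line distances to the coordinates of a reference point can be sketched as follows. The function name `reference_point_coordinates` is hypothetical, and the sketch assumes $r > 0$ and a reference point in front of the cameras, so that the angle at $R_1$ is measured as in the derivation above. The example is a round trip: the line distances are generated from known coordinates and then converted back.

```python
import math


def reference_point_coordinates(l1, l2, r, t):
    """Return (x_0, z_01) of a reference point from its line distances to R1 and R2."""
    base = math.hypot(r, t)                               # |R1R2| (Pythagorean theorem)
    cos_q = (l1 ** 2 + base ** 2 - l2 ** 2) / (2.0 * l1 * base)   # law of cosines
    cos_q = max(-1.0, min(1.0, cos_q))                    # guard against rounding
    angle = math.acos(cos_q) - math.atan(t / r)           # angle between QR1 and the X-axis
    return l1 * math.cos(angle), l1 * math.sin(angle)


# Round trip with assumed geometry: R2 sits at (r, -t) relative to R1.
r, t = 155.0, -120.0
x_true, z_true = 77.5, 945.0
l1 = math.hypot(x_true, z_true)
l2 = math.hypot(x_true - r, z_true + t)
x_0, z_01 = reference_point_coordinates(l1, l2, r, t)
print(x_0, z_01)
```

Since $x_0$ is obtained from the cosine of an angle close to 90 degrees for points near the optical axis, it is sensitive to millimeter-level rounding of $l_1$ and $l_2$, while $z_{01}$ is much less affected.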







